AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Satoshi Koide, Keisuke Kawano, Takuro Kutsuna

Neural Edit Operations for Biological Sequences

Neural Information Processing SystemsFeb-14-2026, 16:02:21 GMT

Neural Information Processing Systems http://nips.cc/

accuracy, architecture, regular expression, (16 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)

Neural Information Processing SystemsFeb-11-2026, 14:25:41 GMT

SupplementaryMaterial

A sitting is a meeting of parliament members. While in the virtual environment, you will need to install the specific Gensim1 version needed for theCompassapproach. Inotherinstances,thebeginning of the line that specifies the speaker consists of the role of the parliament member, for example "SPEAKEROFTHEPARLIAMENT" (meaning the member of parliament presiding), followed, but not always, by the actual full name of the person in parenthesis. Theidisa unique number we assigned to each file. Themainchallenge of translating the files from Greek to English was the conversion of the Greek alphabetic numeralstoindo-arabicnumerals.

artificial intelligence, machine learning, natural language, (18 more...)

Country:

Europe > Greece (0.05)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Florida > Hillsborough County > Tampa (0.04)
(2 more...)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Kembu, Vignesh Kumar, Morandini, Pierandrea, Ranzini, Marta Bianca Maria, Nocera, Antonino

Are LLMs Truly Multilingual? Exploring Zero-Shot Multilingual Capability of LLMs for Information Retrieval: An Italian Healthcare Use Case

arXiv.org Artificial IntelligenceDec-5-2025

Large Language Models (LLMs) have become a key topic in AI and NLP, transforming sectors like healthcare, finance, education, and marketing by improving customer service, automating tasks, providing insights, improving diagnostics, and personalizing learning experiences. Information extraction from clinical records is a crucial task in digital healthcare. Although traditional NLP techniques have been used for this in the past, they often fall short due to the complexity, variability of clinical language, and high inner semantics in the free clinical text. Recently, Large Language Models (LLMs) have become a powerful tool for better understanding and generating human-like text, making them highly effective in this area. In this paper, we explore the ability of open-source multilingual LLMs to understand EHRs (Electronic Health Records) in Italian and help extract information from them in real-time. Our detailed experimental campaign on comorbidity extraction from EHR reveals that some LLMs struggle in zero-shot, on-premises settings, and others show significant variation in performance, struggling to generalize across various diseases when compared to native pattern matching and manual annotations.

large language model, machine learning, natural language, (20 more...)

2512.04834

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing SystemsNov-20-2025, 23:01:41 GMT

Neural Edit Operations for Biological Sequences

biological sequence, name change, neural edit operation, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Satoshi Koide, Keisuke Kawano, Takuro Kutsuna

Neural Edit Operations for Biological Sequences

Neural Information Processing SystemsNov-20-2025, 20:12:44 GMT

The evolution of biological sequences, such as proteins or DNAs, is driven by the three basic edit operations: substitution, insertion, and deletion.

artificial intelligence, machine learning, natural language, (20 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)

arXiv.org Artificial IntelligenceNov-17-2025

Automata-Based Steering of Large Language Models for Diverse Structured Generation

Luan, Xiaokun, Wei, Zeming, Zhang, Yihao, Sun, Meng

Large language models (LLMs) are increasingly tasked with generating structured outputs. While structured generation methods ensure validity, they often lack output diversity, a critical limitation that we confirm in our preliminary study. We propose a novel method to enhance diversity in automaton-based structured generation. Our approach utilizes automata traversal history to steer LLMs towards novel structural patterns. Evaluations show our method significantly improves structural and content diversity while maintaining comparable generation efficiency. Furthermore, we conduct a case study showcasing the effectiveness of our method in generating diverse test cases for testing open-source libraries.

artificial intelligence, large language model, natural language, (16 more...)

2511.11018

Country:

Europe (0.67)
North America > United States (0.46)
Asia > China (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Dingfelder, Philipp, Riess, Christian

Sure! Here's a short and concise title for your paper: "Contamination in Generated Text Detection Benchmarks"

arXiv.org Artificial IntelligenceNov-13-2025

Large language models are increasingly used for many applications. To prevent illicit use, it is desirable to be able to detect AI-generated text. Training and evaluation of such detectors critically depend on suitable benchmark datasets. Several groups took on the tedious work of collecting, curating, and publishing large and diverse datasets for this task. However, it remains an open challenge to ensure high quality in all relevant aspects of such a dataset. For example, the DetectRL benchmark exhibits relatively simple patterns of AI-generation in 98.5% of the Claude-LLM data. These patterns may include introductory words such as "Sure! Here is the academic article abstract:", or instances where the LLM rejects the prompted task. In this work, we demonstrate that detectors trained on such data use such patterns as shortcuts, which facilitates spoofing attacks on the trained detectors. We consequently reprocessed the DetectRL dataset with several cleansing operations. Experiments show that such data cleansing makes direct attacks more difficult. The reprocessed dataset is publicly available.

large language model, machine learning, natural language, (17 more...)

2511.092

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceNov-10-2025

What Are the Facts? Automated Extraction of Court-Established Facts from Criminal-Court Opinions

Bendová, Klára, Knap, Tomáš, Černý, Jan, Pour, Vojtěch, Savelka, Jaromir, Kvapilíková, Ivana, Drápal, Jakub

Criminal justice administrative data contain only a limited amount of information about the committed offense. However, there is an unused source of extensive information in continental European courts' decisions: descriptions of criminal behaviors in verdicts by which offenders are found guilty. In this paper, we study the feasibility of extracting these descriptions from publicly available court decisions from Slovakia. We use two different approaches for retrieval: regular expressions and large language models (LLMs). Our baseline was a simple method employing regular expressions to identify typical words occurring before and after the description. The advanced regular expression approach further focused on "sparing" and its normalization (insertion of spaces between individual letters), typical for delineating the description. The LLM approach involved prompting the Gemini Flash 2.0 model to extract the descriptions using predefined instructions. Although the baseline identified descriptions in only 40.5% of verdicts, both methods significantly outperformed it, achieving 97% with advanced regular expressions and 98.75% with LLMs, and 99.5% when combined. Evaluation by law students showed that both advanced methods matched human annotations in about 90% of cases, compared to just 34.5% for the baseline. LLMs fully matched human-labeled descriptions in 91.75% of instances, and a combination of advanced regular expressions with LLMs reached 92%.

artificial intelligence, large language model, natural language, (19 more...)

2511.0532

Country:

North America > United States (0.46)
Europe > Czechia (0.29)

Genre: Research Report (1.00)

Industry: Law > Criminal Law (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial IntelligenceNov-10-2025

Mind the Gap... or Not? How Translation Errors and Evaluation Details Skew Multilingual Results

Peter, Jan-Thorsten, Vilar, David, Domhan, Tobias, Malkin, Dan, Freitag, Markus

In addition they have also shown impressive capabilities in different domains, like coding, science and math. In this short paper, taking math as an example domain, we study the performance of different LLMs across languages. Experimental results show that there exists a non-negligible and consistent gap in the performance of the models across languages. Interestingly, and somewhat against expectations, the gap exists for both high-and low-resource languages. We hope that these results influence further research into cross-lingual capability generalization for next generation LLMs. If it weren't for the fact that they are false! By analyzing one of the standard multilingual math benchmarks (MGSM), we determine that several translation errors are present in the data. Furthermore, the lack of standardized answer extraction from LLM outputs further influences the final results. We propose a method for automatic quality assurance to address the first issue at scale, and give recommendations to address the second one. Combining these two approaches we show that the aforementioned language gap mostly disappears, leading to completely different conclusions from our research. In recent years, large language models' capabilities have expanded in two primary directions: broader language coverage and enhanced performance on complex tasks. On the language dimension, it is now usual for LLMs to support not only high-resource languages languages (e.g. This is a very important and welcome progress direction in order to improve the inclusivity of AI applications and research.

large language model, machine learning, natural language, (19 more...)

2511.05162

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)